We analyze the entire publication database of the American Physical Society generating longitudinal 
(50 years) citation networks geolocalized at the level of single urban areas. We define the knowledge 
diffusion proxy and the scientific production ranking algorithms to capture the spatio-temporal dynamics of 
Physics knowledge worldwide. By using the knowledge diffusion proxy we identify the key cities in the 
production and consumption of knowledge in Physics as a function of time. The results from the scientific 
production ranking algorithm allow us to characterize the top cities for scholarly research in Physics. 
Although we focus on a single dataset concerning a specific field, the methodology presented here opens the 
path to comparative studies of the dynamics of knowledge across disciplines and research areas. 
Over the last decade, the digitalization of publication datasets has propelled bibliographic studies, allowing 
for the first time access to the geospatial distribution of millions of publications and citations at different 
granularities 1-8 (see Ref. 9 for a review). More precisely, authors' names, affiliations, addresses, and 
references can be aggregated at different scales and used to characterize the publication and citation patterns of 
single papers 10,11, journals 12,13, authors 14-16, institutions 17, cities 18, or countries 19. The sheer size of the datasets 
also allows system-level analyses of research production and consumption 20, migration of authors 21,22, and changes 
in production in several regions of the world as a function of time 5,6, just to name a few examples. At the same time, 
those analyses have spurred intense research activity aimed at defining metrics able to capture the importance or 
ranking of authors, institutions, or even entire countries 14,15,17,23-29. Whereas such large datasets are extremely 
useful in understanding scholarly networks and in charting the creation of knowledge, they are also pointing out 
the limits of our conceptual and modeling frameworks 30 and call for a deeper understanding of the dynamics 
ruling the diffusion and fruition of knowledge across the social and geographical space. 

In this paper we study citation patterns of articles published in the American Physical Society (APS) journals over 
a fifty-year time interval (1960-2009) 31. Although in the early years of this period the dataset was obviously biased 
toward scholarly activity within the USA, in the last twenty years only about 35% of the papers were produced in 
the USA. A similar share has been observed in databases that include multiple journals and 
disciplines 7,19. Indeed, the journals of the APS are considered worldwide as reference publication venues that 
represent the international research activity in Physics well. Furthermore, this dataset does not bundle different 
disciplines and publication languages, providing a homogeneous dataset concerning Physics scholarly research. 
For each paper we geolocalize the institutions contained in the authors' affiliations. In this way we are able to 
associate each paper in the database with specific urban areas. This defines a time-resolved, geolocalized citation 
network including 2,307 cities around the world engaged in the production of scholarly work in the area of 
Physics. Following previous works 8,17, we assume that the number of given or received citations is a proxy of 
knowledge consumption or production, respectively. More precisely, we assume that citations are the currency 
traded between parties in the knowledge exchange. Nodes that receive citations export their knowledge to others. 
Nodes that cite other works import knowledge from others. According to this assumption we classify nodes 
by the unbalance in their trade. Knowledge producers are nodes that are cited (export) more than they 
cite (import). Conversely, we label as consumers the nodes that cite (import) more than they are cited (export). 
Using this classification, we define the knowledge diffusion proxy algorithm to explore how scientific knowledge 
flows from producers to consumers. This tool explicitly assumes a systemic perspective of knowledge diffusion, 
highlighting the global structure of scientific production and consumption in Physics. 

The temporal analysis reveals interesting patterns and the progressive delocalization of knowledge producers. 
In particular, we find that in the last twenty years the geographical distribution of knowledge production has 
drastically changed. A paramount example is the transition in the USA from knowledge production localized 
around major urban areas on the east and west coasts to a broad 
geographical distribution where a significant part of the knowledge 
production now also occurs in the midwestern and southern 
states. Analogously, we observe the early-90s dominance 
of the UK and Northern Europe subside in favor of increased production 
from France, Italy and several regions of Spain. Interestingly, the last 
decade shows that several of China's urban areas are emerging as the 
largest knowledge consumers worldwide. The reasons underlying 
this phenomenon may be related to the significant growth of the 
economy and of the research and development sector in China in 
the early 21st century 32. This positive stimulus also pushed up 
scientific consumption, with a large number of papers citing work 
from other world areas. Indeed, the increase in publications is associated 
with an increase in the citation unbalance, moving China to the 
top ranks among consumers, since the recent influx of its new papers has 
not yet had the time to accumulate citations. 

Although the knowledge diffusion proxy provides a measure of 
knowledge production and consumption, it may be inadequate for 
ranking the most authoritative cities for Physics research. 
Indeed, a key issue in appropriately ranking knowledge production 
is that not all citations have the same weight. Citations coming 
from authoritative nodes are heavier than those coming from less 
important nodes, thus defining a recursive diffusion of ranking among 
nodes in the citation network. In order to include this element in the 
ranking of cities we propose the scientific production ranking algorithm. 
This tool, inspired by PageRank 33, allows us to define the 
rank of each node as a function of time, going beyond the knowledge 
diffusion proxy or simple local measures such as citation counts or the h-index 14. 
In this algorithm the importance of each node diffuses 
through the citation links. The rank of a node is determined by the 
rank of the nodes that cite it, recursively, thus implicitly weighting 
citations from highly and lowly ranked nodes differently. Also in this 
case we observe noticeable changes in the ranking of cities over the 
years. For instance, the presence of both European and Asian cities in 
the top 100 list increases by 50% in the last 20 years. These findings 
suggest that the Internet, digitalization and accessibility of publications 
are creating a more level playing field where the dominance 



[Figure 1, panel (A) labels: Paper A (xxx university, Ann Arbor, Michigan; yyy university, Los Alamos, New Mexico; zzz university, New York, New York); Paper B (university of aaa, Madrid, Spain; bbb center, Rome, Italy); Paper C (university of ccc, Oxford, United Kingdom; ddd institute, Princeton, New Jersey).]




Figure 1 | Projecting a paper citation relationship into a city-to-city 
citation network. (A) Paper A, written by authors from Ann Arbor, Los 
Alamos and New York, cites paper B, written by authors from Rome and 
Madrid, and paper C, from Oxford and Princeton. (B) In the city-to-city 
citation network, directed links from Ann Arbor to Madrid, Rome, 
Oxford and Princeton are generated, and similarly Los Alamos and New 
York are connected to the same four cited cities. 



of specific areas of the world is being progressively eroded to the 
advantage of a more widespread and complex knowledge production 
and consumption dynamic. 

Results 

We focus our analysis on the APS dataset 31. It contains all the papers 
published by the APS from 1893 to 2009. We consider only the last 
50 years due to the incomplete geolocalization information available 
for the early years. During this period, the large majority of indexed 
papers, 97.47%, contain complete information such as authors' names, 
journal of publication, date of publication, list of affiliations and list of 
citations to other articles published in APS journals. We geolocalized 
96.97% of papers at urban area level with an accuracy of 98.5%. We 
refer the reader to the Methods section and to the Supplementary 
Information (SI) for the detailed description of the dataset and the 
techniques developed to geolocalize the affiliations. 

In total, only 43% of the papers were produced inside the USA. 
Interestingly, this fraction has decreased over time. For example, in 
the 1960s it was 85.59%, while in the last 10 years it decreased to just 
36.67%. While one might assume that the APS dataset is biased 
toward the USA scientific community, the percentage of publications 
contributed by the USA in APS journals after 1990 is almost the same 
as in other publication datasets 7,19. These alternative datasets contain 
journals published all over the world and mix different scientific 
disciplines. This supports the idea that the APS journals are now 
attracting the worldwide physics community independently 
of nationality, and fairly represent the world production 
and consumption of Physics. It is not possible to quantify 
a possible nationality bias and disentangle it from an actual 
change in the dynamics of knowledge production. For this reason, and 
in order to minimize any bias, we focus our analysis on 
the last 20 years of data. 

In order to construct the geolocalized citation network we consider 
nodes (urban areas) and directed links representing the presence 
of citations from a paper with an affiliation in one urban area to a 
paper with an affiliation in another urban area. For example, if a paper 
written in node $i$ cites a paper written in node $j$, there is a link from 
$i$ to $j$, i.e., $j$ receives a citation from $i$ and $i$ sends a citation to $j$. Each 
paper may have multiple affiliations, and therefore citations have to 
be proportionally distributed among all the nodes of the papers. For 
this reason we weight each link in order to take into account the 
presence of multiple affiliations and multiple citations. In a given 
time window, the total number of citations that papers written in $j$ 
receive from papers written in $i$ is the weight of the link $i \to j$, and 
the total number of citations that papers written in $j$ send to 
papers written in $k$ is the weight of the link $j \to k$. For instance, if in a 
time window $t$ there is one paper written in node $j$ that cites two 
papers written in node $k$ and is cited by three papers written in 
node $i$, then $w_{jk} = 2$ and $w_{ij} = 3$; we add such contributions over all 
papers written in node $j$ to obtain the link weights. For 
papers written in multiple cities, say $j_1, j_2$, the weight is counted 
equally. The time window we use in this manuscript is one year. We 
show an example of the network construction in Figure 1. 
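
As an illustration of the construction just described, the following Python sketch projects paper-level citations onto a weighted city-to-city network. It is a minimal sketch and not the authors' implementation: the function name `build_city_network`, the input layout, and in particular the rule that each paper-to-paper citation carries one unit of weight split evenly over all (citing city, cited city) pairs are assumptions made here for illustration. The example data mirror the situation of Figure 1.

```python
from collections import defaultdict
from itertools import product


def build_city_network(paper_cities, citations):
    """Project paper citations onto weighted city-to-city links.

    paper_cities: dict mapping a paper id to the cities of its affiliations.
    citations: iterable of (citing_paper, cited_paper) pairs in one time window.
    Returns a dict mapping (citing_city, cited_city) to the link weight.
    """
    weights = defaultdict(float)
    for citing, cited in citations:
        sources = sorted(set(paper_cities[citing]))
        targets = sorted(set(paper_cities[cited]))
        # Assumption: one citation = one unit of weight, shared equally
        # among all pairs of citing and cited cities.
        share = 1.0 / (len(sources) * len(targets))
        for i, j in product(sources, targets):
            weights[(i, j)] += share
    return dict(weights)


# Figure 1 example: paper A cites papers B and C.
paper_cities = {
    "A": ["Ann Arbor", "Los Alamos", "New York"],
    "B": ["Madrid", "Rome"],
    "C": ["Oxford", "Princeton"],
}
network = build_city_network(paper_cities, [("A", "B"), ("A", "C")])
print(len(network))  # number of directed city-to-city links
```

With this splitting rule the example yields twelve directed links, each of weight 1/6, so the link weights add up to the two paper-level citations. A different convention (for instance, giving every city pair the full weight) would only change the `share` line.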

In order to define the main actors in the production and consumption 
of Physics, we consider citations as a currency of trade. This analogy 
allows us to immediately grasp the meaning of, and the distinction between, 
producers and consumers of scientific knowledge. Nodes that receive 
citations export their knowledge to the citing nodes. Conversely, nodes 
that cite papers produced by other nodes of the network import 
knowledge from the cited nodes. Measuring the trade unbalance 
in citations, we define producers as cities that export more than 
they import, and consumers as cities that import more than they 
export. More precisely, we can measure the total knowledge 
imported by each urban area $i$ as $\sum_j w_{ij}$ and the total export as 
$\sum_j w_{ji}$ in a given year. Those measures, however, acquire specific 
meaning when considered relative to the total trade of physics 



SCIENTIFIC REPORTS | 3 : 1640 | DOI: 10.1038/srep01640 






knowledge worldwide in the same year, i.e., the total number of citations 
worldwide $S = \sum_{i,j} w_{ij}$. The relative trade unbalance of each 
urban area $i$ is then: 

$$\Delta S_i = \frac{\sum_j w_{ji} - \sum_j w_{ij}}{S} \qquad (1)$$



A negative or positive value of this quantity indicates whether urban area 
$i$ is a consumer or a producer, respectively. In Figure 2-A we show 
the worldwide geographical distribution of producer (red) and 
consumer (blue) urban areas for 1990 and 2009. Interestingly, 
during the 90s the production of Physics knowledge was highly localized 
in a few cities on the eastern and western coasts of the USA and 
in a few areas of Great Britain and Northern Europe. In 2009 the 
picture is completely different, with many producer cities in the central 
and southern parts of the USA, Europe and Japan. It is interesting to 
note that although the fraction of papers produced in the USA is 
generally decreasing or stable, many more cities in the USA acquire 
the status of knowledge producers. This implies that the quality of 
knowledge production from the USA is increasing and thus attracting 
more citations. This makes it clear that the knowledge produced 
by an urban area cannot be measured only by the 
raw number of papers. Citations are a more appropriate proxy that 
encodes the value of the products. They serve as an approximation of 
the actual flow of knowledge. Figure 2-A also makes it clear that 
cities in China play the role of major consumers in both 1990 
and 2009. We also observe that cities in other countries like Russia 




Figure 2 | Spatial distributions of scientific producers and consumers of Physics. The geospatial distribution of scientific producer and consumer cities. 
(A) The world map of producers and consumers at the city level in 1990 (top) and 2009 (bottom). A producer city, for which the relative unbalance $\Delta S_i > 0$, 
is coloured in red scale. A consumer, with relative unbalance $\Delta S_i < 0$, is coloured in blue scale. The darkness of colour is proportional to the absolute 
value of the unbalance: the larger the absolute value, the darker the colour. (B) The map of producer and consumer cities in the continental 
United States in 1990 (left) and 2009 (right). (C) The map of producer and consumer cities in selected European countries in 1990 (left) and 2009 (right). 
In (B) and (C), a producer city is marked with a red bar, while a consumer city is marked with a blue bar. The height of each bar is scaled with $|\Delta S_i|$. Note 
that in (C) the height of bars is not on the same scale as in (B), for visibility. Maps in panel (A) were created using ArcGIS® 47, and maps in panels (B) and 
(C) were created using R 48. 
Figure 3 | Networks structure. The network structures of city-to-city citation networks. (A) The backbones ($\alpha = 0.1$) of the citation networks at the city 
level within the United States in 1960, 1990, 2009 (from left to right). (B) The backbones ($\alpha = 1, 0.1, 0.1$ from left to right) of the citation networks at 
the city level within the European Union 27 countries as well as Switzerland and Norway in 1960, 1990, 2009 (from left to right). In (A) and (B), the 
color shows the direction of links: if node $i$ cites node $j$ there is a link starting with blue and ending with red. (C) The cumulative distribution function of 
the link weights $F_w(w_{ij}) = P(w > w_{ij})$ for the city-to-city citation networks in 1960, 1990 and 2009 (from left to right). The maps of networks in (A) 
and (B) were created using JFlowMap 40. 



and India consumed less in 2009 than in 1990. In other words, in 2009 
both the production and consumption of knowledge are less concentrated 
in specific places and generally spread more evenly geographically. 
In order to provide visual support to this conclusion we 
show in Figure 2-B the geographical distribution of producers and 
consumers inside the USA. The two maps make evident the drift 
of knowledge production from the two coastal areas of the USA to the 
midwest, central and southern states. Similarly, in Figure 2-C we plot 
the same information for western Europe. In 1990 only a few urban 
areas in Germany and France were clearly producers. By 2009 this 
dominance has been consistently eroded by Italy, Spain and a more 
widespread geographical distribution of producers in France, 
Germany and the UK. 
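
To make the producer/consumer classification behind these maps concrete, the sketch below computes the relative trade unbalance of Eq. (1) for every city: citations received (exports) minus citations sent (imports), divided by the total number of citations in the network. The function name `trade_unbalance`, the dictionary layout of the link weights and the three-city toy data are illustrative assumptions, not part of the original analysis.

```python
from collections import defaultdict


def trade_unbalance(weights):
    """Relative trade unbalance of each city.

    weights: dict mapping (citing_city, cited_city) to the link weight,
             i.e. weights[(i, j)] is the weight of citations sent by i to j.
    Returns a dict city -> (exports - imports) / total citations.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("the network contains no citations")
    imports = defaultdict(float)  # citations sent by a city
    exports = defaultdict(float)  # citations received by a city
    for (i, j), w in weights.items():
        imports[i] += w
        exports[j] += w
    cities = set(imports) | set(exports)
    return {c: (exports[c] - imports[c]) / total for c in cities}


# Hypothetical toy network with three cities.
toy_weights = {("A", "B"): 3.0, ("B", "A"): 1.0, ("C", "B"): 1.0}
unbalance = trade_unbalance(toy_weights)
producers = sorted(c for c, v in unbalance.items() if v > 0)
consumers = sorted(c for c, v in unbalance.items() if v < 0)
print(producers, consumers)
```

In the toy network city B receives four citations and sends one, so it is the only producer, while A and C are consumers. Because every citation is an export for one city and an import for another, the unbalances of all cities sum to zero.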

Knowledge diffusion proxy. The definition of producers and consumers 
is based on a local measure that cannot capture all 
possible correlations and bonds between nodes that are not directly 
connected. This might result in a partial view and description of the 
system, especially when connectivity patterns are complex 34-38. 
Interestingly, a close analysis of each citation network (see 
Figure 3) clearly shows that citation patterns have indeed all the 
hallmarks of complex systems 34-38, especially in the last two 
decades. The system is self-organized: there is no central 
authority that assigns citations and papers to cities, there is no 
blueprint of the system's interactions, and, as clearly shown in 
Figure 3-C, the statistical characteristics of the system are described 
by heavy-tailed distributions 34-38. Not surprisingly, the level of 
complexity of the system has increased with time. In Figure 3-A 
we plot the most statistically significant connections of the citation 
network between cities inside the USA in 1960, 1990 and 2009. We 
filter links by using the backbone extraction algorithm 39, which 
preserves the relevant connections of weighted networks while 
removing the least statistically significant ones. We visualize each 
filtered network by using a bundled representation of links 40. The 
direction of each weighted link goes from blue (citing) to red (cited). 
Similarly, in Figure 3-B, we visualize the most significant links 
between cities in Europe (European Union's 27 countries, as well 
as Switzerland and Norway). It is clear from Figures 3-A and 3-B that in 
1960 the citation patterns inside the USA were limited to a few 
cities, and in Europe only a few cities were connected. Instead, in 
1990 and 2009 we register an increase in the interactions among a 
larger number of cities. The observed temporal trend is well known 
and valid not just for Physics 41. Among the many factors that have been 
proposed to explain this tendency are the growth of the 
research system and the advances in technology that make 
collaboration and publishing easier 20,42-44. 
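
The link filtering step can be illustrated with a disparity-filter style significance test, a common way to extract the backbone of a weighted network. Whether this matches the algorithm of Ref. 39 in every detail is an assumption of this sketch, as are the function name `disparity_backbone`, the rule of keeping a link when it is significant for either of its endpoints, and the toy data. For a node with $k$ links and total strength $s$, a link of weight $w$ is kept when $(1 - w/s)^{k-1}$ falls below the significance level `alpha`.

```python
from collections import defaultdict


def disparity_backbone(weights, alpha=0.1):
    """Keep only the statistically significant links of a directed network.

    weights: dict mapping (citing_city, cited_city) to a positive weight.
    alpha: significance level; smaller values keep fewer links.
    """
    out_strength, in_strength = defaultdict(float), defaultdict(float)
    out_degree, in_degree = defaultdict(int), defaultdict(int)
    for (i, j), w in weights.items():
        out_strength[i] += w
        out_degree[i] += 1
        in_strength[j] += w
        in_degree[j] += 1

    def p_value(w, strength, degree):
        # Probability of a share at least w/strength if the strength were
        # split at random among `degree` links; undefined for one link.
        if degree < 2:
            return 1.0
        return (1.0 - w / strength) ** (degree - 1)

    backbone = {}
    for (i, j), w in weights.items():
        p_out = p_value(w, out_strength[i], out_degree[i])
        p_in = p_value(w, in_strength[j], in_degree[j])
        # Assumption: a link survives if it is significant for either end.
        if min(p_out, p_in) < alpha:
            backbone[(i, j)] = w
    return backbone


# Hypothetical hub citing ten cities, one of them far more than the others.
hub_links = {("H", "T0"): 100.0}
hub_links.update({("H", f"T{n}"): 1.0 for n in range(1, 10)})
filtered = disparity_backbone(hub_links, alpha=0.1)
print(sorted(filtered))
```

In the toy example only the heavy link from H to T0 survives, because the nine unit-weight links are compatible with a random split of the hub's citations.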

In order to explicitly consider the complex flow of citations 
between producers and consumers, we propose the knowledge diffusion 
proxy algorithm (see Methods section for the formal definition). 
In this algorithm, producers inject citations in the system that 
flow along the edges of the network to finally reach consumer cities, 
where the injected citations are absorbed. The algorithm 
allows charting the diffusion of knowledge, going beyond local measures. 
The entire topology of the networks is explored, uncovering 
nontrivial correlations induced by global citation patterns. For 
instance, knowledge produced in a city may be consumed by another 
producer that in turn produces knowledge for other cities that are 
consumers. This points out that the actual consumer of knowledge is 
signalled not just by the unbalance of citations but by the overall 
topology of the production and consumption of knowledge in the 
whole network. Indeed, the final consumer of each injected citation 
may not be directly connected with the producer. Citations flow 
Table 1 | Rankings from the knowledge diffusion proxy algorithm for the top 3 producer cities in 2009. In bold, we highlight cities that are present in 
the top 10 consumers ranked according to the knowledge diffusion proxy but do not appear in the top 10 cities ranked according to local citation 
unbalance 

      Boston                                  Berkeley                                New Haven
Rank  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance
1     Athens              Madrid              Athens              Athens              Berlin              Vancouver
2     Madrid              Athens              Gwangju             Madrid              Athens              Paris
3     Vancouver           Vancouver           Bratislava          Bratislava          Mainz               Trieste
4     Gwangju             Moscow              Madrid              Paris               Vancouver           Athens
5     Bratislava          Paris               Vancouver           Vancouver           Gwangju             Gwangju
6     Berlin              Tokyo               Trieste             Gwangju             Trieste             Bratislava
7     Trieste             Trieste             Waco                Moscow              Bratislava          Madrid
8     Mainz               Beijing             Paris               Trieste             Coventry            Liverpool
9     Paris               Berlin              Berlin              Seoul               Valencia            Oxford
10    Waco                Gwangju             Mainz               Waco                Madrid              Santa Barbara



along all possible paths, sometimes through intermediate cities. In 
Table 1 and Table 2 we report the rankings of the Top 10 final consumers 
evaluated by the knowledge diffusion proxy for the Top 3 
producers in 2009 and 1990, respectively. We also list the Top 10 
neighbours according to the local citation unbalance. From these two 
tables, it is clear that the final rank of each consumer obtained by our 
algorithm can be extremely different from the ranking obtained by 
just considering local unbalances. For instance, in 2009 Bratislava 
and Mainz rank among the top 10 consumers absorbing knowledge produced 
in Boston. However, according to the local measure of unbalance, these 
two cities rank outside the top 10 (shown in bold in Table 1). 
Interestingly, even the Top consumer for New Haven, Berlin, 
does not rank among the Top 10 neighbours according to the citation 
unbalance. These findings confirm that in order to uncover the complex 
set of relationships among cities, it is crucial to consider the 
entire structure of the network, going beyond simple local measures. 
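
The idea of injected citations that travel until a consumer absorbs them can be sketched as an absorbing random walk. The formal definition of the knowledge diffusion proxy is given in the Methods section, which is not reproduced here, so everything specific in this sketch is an assumption: tokens move from a cited city to one of the cities citing it with probability proportional to the link weight, a consumer absorbs a token on first arrival, producers pass it on, and walks are cut after `max_steps`. The function name `diffusion_proxy` and the toy network are hypothetical.

```python
import random
from collections import Counter, defaultdict


def diffusion_proxy(weights, unbalance, producer, n_tokens=1000,
                    max_steps=100, seed=0):
    """Count how often each consumer absorbs tokens injected by `producer`.

    weights: dict mapping (citing_city, cited_city) to the link weight.
    unbalance: dict city -> relative trade unbalance (negative = consumer).
    """
    rng = random.Random(seed)
    # Knowledge flows against the citation direction: from cited to citing.
    citing = defaultdict(list)
    for (i, j), w in weights.items():
        if i != j and w > 0:
            citing[j].append((i, w))
    counts = Counter()
    for _ in range(n_tokens):
        city = producer
        for _ in range(max_steps):
            neighbours = citing.get(city)
            if not neighbours:
                break  # nobody cites this city: the token is lost
            cities, link_weights = zip(*neighbours)
            city = rng.choices(cities, weights=link_weights)[0]
            if unbalance[city] < 0:
                counts[city] += 1  # assumption: absorbed on first arrival
                break
    return counts


# Hypothetical network: P1 is cited by P2 and C1, and P2 is cited by C2.
toy_weights = {("P2", "P1"): 1.0, ("C1", "P1"): 1.0, ("C2", "P2"): 2.0}
toy_unbalance = {"P1": 0.5, "P2": 0.25, "C1": -0.25, "C2": -0.5}
absorbed = diffusion_proxy(toy_weights, toy_unbalance, "P1")
print(absorbed.most_common())
```

In the toy network C2 never cites P1 directly, yet it absorbs the tokens that pass through the intermediate producer P2. This is the kind of indirect consumption that a purely local count of citation unbalance cannot reveal.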

In Figure 4-A and Figure 4-B we visualize the results considering 
the Top four producer cities in 2009 in the USA and in Europe, 
respectively. We show their Top ten consumers over 20 years as a 
function of time. The size of each circle is proportional to how many 
times each injected citation is absorbed by that consumer. In the plot, 
vertical grey strips indicate that the city was not a producer during 
those years (e.g. Orsay in 2008). The results show that, on average, 
Beijing is the top consumer for all of these producers in the past 
20 years. Since China registered strong economic growth and an increase 
in its research population in the early 2000s, it is reasonable to 
assume that, thanks to this positive stimulus, many more papers were 
written in its capital, a dominant city for scientific research in China. 
However, the fast publication growth increased the unbalance 
between sent and received citations. Each paper published in a given 
city imports knowledge from the cited cities. Reaching a balance 
might require some time: each city needs to accumulate citations 
back in order to export its knowledge to other cities. We can speculate that in 
the near future cities in China might move among the strongest 
producers if a fair number of papers start receiving enough citations, 
which obviously depends on the quality of the research carried out in 
recent years. This is the case for cities like Tokyo, which has gradually 
approached citation balance in recent years. For instance, Table 2 
shows that in 1990 Tokyo was among the top consumers. But by 
2009, its contribution to citation consumption had become less 
significant, as observed in Figure 4 and Table 1. 

Ranking cities. Authors, departments, institutions, governments and 
many funding agencies are extremely interested in identifying the most 
important sources of knowledge. The need for objective 
measures of the importance of papers, authors, journals, and 
disciplines has led to the definition of a wide variety of rankings 23,24. 
Measures such as impact factor, number of citations and h-index 14 
are commonly used to assess the importance of scientific production. 
However, these common indicators might fail to account for the 
actual importance and prestige associated with each publication. In 
order to overcome these limitations, many different measures have 
been proposed 25-28. Here we introduce the scientific production 
ranking algorithm (SPR), an iterative algorithm based on the 
notion of diffusing scientific credits. It is analogous to PageRank 33, 
CiteRank 26, HITS 25, SARA 29, and other ranking metrics. In the 
algorithm each node receives a credit that is redistributed to its 
neighbours at the next iteration until the process converges to a 
stationary distribution of credit over all nodes (see Methods section 
for the formal definition). The credits diffuse following citation links 



Table 2 | Rankings from the knowledge diffusion proxy algorithm for the top 3 producer cities in 1990. In bold, we highlight cities that are present in 
the top 10 consumers ranked according to the knowledge diffusion proxy but do not appear in the top 10 cities ranked according to local citation 
unbalance 

      Piscataway                              Boston                                  Palo Alto
Rank  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance  Diffusion proxy     Citation unbalance
1     Tokyo               Stuttgart           Tokyo               Tokyo               Tokyo               Tokyo
2     Beijing             Tokyo               Grenoble            Grenoble            Beijing             Ann Arbor
3     Tsukuba             Los Angeles         Beijing             Los Angeles         Tsukuba             Bloomington
4     Grenoble            Urbana              Tsukuba             College Park        Seoul               Boulder
5     Tallahassee         College Park        Seoul               Los Alamos          Tallahassee         Urbana
6     Hamilton            Grenoble            Vancouver           Urbana              Charlottesville     Berlin
7     Buffalo             Rochester           Tallahassee         Boulder             Vancouver           Orsay
8     Vancouver           Boston              Warsaw              Rochester           Berlin              Denver
9     Charlottesville     Los Alamos          Kolkata             Vancouver           Durham              Seoul
10    Tempe               Hamilton            Charlottesville     Bloomington         Taipei              Los Alamos
Figure 4 | Knowledge diffusion proxy results. (A) The Top 4 producer cities in the USA in 2009 and their Top 10 consumers from the knowledge diffusion 
proxy algorithm in 1990-2009. (B) The Top 4 producer cities in the European Union 27 countries as well as Switzerland and Norway in 2009 and their 
Top 10 consumers from the knowledge diffusion proxy algorithm in 1990-2009. When a producer city becomes a consumer in some year, a grey strip is 
marked in that year. For each producer city in (A) and (B), the major consumers of the producer city $m$ in 20 years are plotted as a function of time 
from 1990 to 2009. The size of the bubble in position $(Y, c)$ is proportional to the counter $g_{m,c}(Y)$ in that year. The consumer cities for each producer 
are ordered according to the total number of counters in 20 years, i.e., $\sum_Y g_{m,c}(Y)$. 



self-consistently, implying that not all links have the same 
importance. Any city in the network will be more prominent in 
rank if it receives citations from high-rank sources. This process 
ensures that the rank of each city is self-consistently determined 
not just by the raw number of citations but also by whether the citations 
come from highly ranked cities. In Figure 5 we show the Top 20 
cities from 1990 to 2009. Interestingly, we clearly see the decline 
and rise of cities over the years as well as the steady leadership of 
Boston and Berkeley. This behaviour is clear in Figure 6-B, where we 
show the rank for cities in the USA in 1990 and 2009. Meanwhile, the 
ranking of cities in European and Asian countries like France, Italy 
and Japan has increased significantly, as shown in both Figure 5 and 
Figure 6-A. In Figure 6-C we focus on the geographical distribution 
of ranks for a selected set of European countries in 1990 and 2009. In 
Table 3 we provide a quantitative measure of the change in the 
landscape of the most highly ranked cities in the world by showing 
the percentage of cities in the top 100 ranks for different continents. 
In Figure 7, we compare the ranking obtained by our recursive 
algorithm with the ranking obtained by considering the total 
volume of publications produced in each city. Since we are 
considering only journals by the APS, the impact factor is 
consistent across all cities and does not include the disproportionate 
effects that often arise when mixing disciplines or journals with 
varied readership. It is then natural to consider a ranking based on 
the raw productivity of each place. As we see in the figure, though, the 
two rankings, although obviously correlated, provide different 
results. A number of cities that rank in the Top 20 in the world 
according to productivity are ranked one order of 
magnitude lower by the SPR algorithm. Valuing the number of 
citations and their origin in the ranking of cities produces results 
often not consistent with the raw number of papers, signaling that in 
some places a large fraction of papers are not producing knowledge 









Figure 5 | Top 20 ranked cities as a function of time. The plot summarizes Top 20 ranked cities in 1990, 1995, 2000, 2005 and 2009 (from left to right), 
and relations between the rankings in different years. The grey lines are used when the rank of that city drops out of Top 20. 



as they are not cited. We believe that the present algorithm may be 
considered as an appropriate way to rank scientific production taking 
properly into account the impact of papers as measured by citations. 
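
The recursive idea behind such a ranking can be illustrated with a generic PageRank-style credit diffusion on the weighted city network. This is a sketch of the general technique and not the SPR algorithm itself, whose formal definition is in the Methods section: the damping factor, the uniform redistribution of credit held by cities that cite nobody, the function name `credit_rank` and the toy network are all assumptions made for illustration.

```python
from collections import defaultdict


def credit_rank(weights, damping=0.85, tol=1e-10, max_iter=1000):
    """PageRank-style credit diffusion along weighted citation links.

    weights: dict mapping (citing_city, cited_city) to a link weight.
    Returns a dict city -> credit; the credits sum to one.
    """
    links = {edge: w for edge, w in weights.items() if w > 0}
    nodes = sorted({city for edge in links for city in edge})
    n = len(nodes)
    out_strength = defaultdict(float)
    for (i, _), w in links.items():
        out_strength[i] += w
    rank = {c: 1.0 / n for c in nodes}
    for _ in range(max_iter):
        # Credit of cities that cite nobody is shared among all cities.
        dangling = sum(rank[c] for c in nodes if out_strength[c] == 0.0)
        new = {c: (1.0 - damping + damping * dangling) / n for c in nodes}
        # Each citing city passes its credit to the cities it cites.
        for (i, j), w in links.items():
            new[j] += damping * rank[i] * w / out_strength[i]
        change = sum(abs(new[c] - rank[c]) for c in nodes)
        rank = new
        if change < tol:
            break
    return rank


# Hypothetical network: A and B both receive exactly one citation, but A is
# cited by H, which is itself cited by three cities, while B is cited by L.
toy_links = {("X", "H"): 1.0, ("Y", "H"): 1.0, ("Z", "H"): 1.0,
             ("H", "A"): 1.0, ("L", "B"): 1.0}
ranks = credit_rank(toy_links)
print(ranks["A"] > ranks["B"])
```

Cities A and B have the same raw citation count, yet A ends up with more credit because its citation comes from a highly cited source. This is the distinction between a recursive ranking and a ranking based on raw counts that the comparison in Figure 7 addresses.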

Discussion 

In this paper we study the scientific knowledge flows among cities as 
measured by papers and citations contained in APS journals 31. In 
order to make clear the meaning of, and difference between, producers 
and consumers in the context of knowledge, we propose an economic 
analogy referring to citations as a traded currency between urban 
areas. We then study the flow of citations from producers to consumers 
with the knowledge diffusion proxy algorithm. Finally, we 
rank the importance of cities as a function of time using the scientific 
production ranking algorithm. This method, inspired by 
PageRank 33, allows us to evaluate the importance of cities explicitly 
considering the complex nature of citation patterns. In our analysis 
we considered just scientific publications contained in the APS 
journals 31. We do not have information on citations received from or 
assigned to papers outside this dataset. These limitations certainly 
affect the count of citations of each city, potentially creating biases in 
our results. However, our findings, while limited to a particular dataset, 
are aligned with observations reported by other studies 
focused on other datasets and fields. For example, we identify major 
US cities (e.g. the Boston and San Francisco areas) as the most important 
sources of Physics. Similar observations have been made by Börner 
et al. 17 at the institution level considering papers published in the 
Proceedings of the National Academy of Sciences, by Mazloumian 
et al. 8 at the country and city level with the Web of Science dataset, and by 



Figure 6 | Geospatial distribution of city ranks. (A) The world map of city ranks in 1990 (left) and 2009 (right). The ranking of each city is represented by 
color from blue (high ranks) to white (low ranks). (B) The map of ranks for cities in the United States in 1990 (left) and 2009 (right). (C) The map of ranks 
for cities in the selected European countries in 1990 (left) and 2009 (right). In (B) and (C), each city is marked with a bar, and the height of each bar is 
inversely proportional to the ranking position. The Top 3 rank positions in each region are labelled for reference. Note that in (C) the height of bars is not 
on the same scale as in (B), for visibility. Maps in panel (A) were created using ArcGIS® 47, and maps in panels (B) and (C) were created using R 48. 
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Table 3 Percentage of top 1 00 ranked cities in continents in 1 990 


and 2009 




Continent 


1990 2009 


Asia 


4.0% 1 1 .0% 


Europe 


24.0% 33.0% 


N. America 


72.0% 56.0% 



Batty 4 at both institution and country level considering the Institute 
for Scientific Information (ISI) HighlyCited database. We also find 
that some European, Russian and Japanese cities have gradually 
improved their productivity and ranks over the last twenty years. 
Similar growth in scientific production has been observed 
by King 19 in the ISI database. As discussed in detail in the SI, by 
aggregating citations of cities to their respective countries, we find 
the same correlation between the number of citations, as well as the 
number of papers, and the GDP invested in Research and 
Development of several countries as reported by Pan et al. 7 based 
on the ISI database. This agreement between our results and many 
others in the literature suggests that the APS dataset, although 
limited, is representative of the overall scientific production of the 
largest countries and cities over the last 20 years. The methodology 
proposed in this paper could be readily extended to larger datasets for 
which the geolocalization of multiple affiliations is possible. In view 
of the different rates of publication and citation in different 
scientific fields, however, we believe that the analysis of scientific 
knowledge production should only consider homogeneous datasets. This 
would help in understanding knowledge flows in different areas and in 
identifying the hot spots of each discipline worldwide. 

Methods 

Dataset. We use the dataset of the American Physical Society journals, covering papers 
published between 1893 and 2009, of which 450,655 papers include a list of 
affiliations 31 . Each paper may have multiple affiliations. In total there are 945,767 
affiliation strings. 

In order to geolocalize the articles, we parse the city names from the affiliation 
strings for each article. First, we process each affiliation string and try to match 
country or US state names from a list of known names and their variations in different 
languages. We crosscheck the results with the Google Maps API, obtaining validated 
location information for 97.7% of affiliation strings, corresponding to 445,223 
articles. It is worth noting that we do not use the Google Maps API (or other map APIs 
like Yahoo! or Bing) directly for geocoding because, to the best of our knowledge, there 
are no accuracy guarantees for the results of these APIs. For each affiliation string with an 
extracted country or state name, we also match the city name against the GeoNames 
database 45 entries for its country or US state. 92.6% of affiliation strings with 
extracted city names are subsequently verified with the Google Maps API. Finally, a total 
of 425,233 articles successfully pass the filters we describe here. 
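
The two matching steps above can be illustrated with a minimal sketch. It assumes that affiliation strings are comma-separated and that the country appears in one of the fields. The `COUNTRY_VARIANTS` and `CITIES` tables and the `geolocate` helper are hypothetical stand-ins for the name lists and the GeoNames lookup, not the actual pipeline, and the crosscheck against a map API is omitted.

```python
import re

# Hypothetical table of country-name variants in different languages.
COUNTRY_VARIANTS = {
    "italy": "Italy",
    "italia": "Italy",
    "germany": "Germany",
    "deutschland": "Germany",
    "usa": "United States",
    "united states": "United States",
}

# Hypothetical per-country gazetteer standing in for a GeoNames lookup.
CITIES = {
    "Italy": {"torino": "Turin", "turin": "Turin", "roma": "Rome"},
    "Germany": {"berlin": "Berlin"},
    "United States": {"boston": "Boston", "chicago": "Chicago"},
}


def fields(affiliation):
    """Split an affiliation string into lowercase comma-separated fields."""
    return [re.sub(r"[^\w\s]", "", part).strip().lower()
            for part in affiliation.split(",")]


def geolocate(affiliation):
    """Return (city, country), or None if either matching step fails."""
    parts = fields(affiliation)
    # Step 1: look for a known country name, scanning from the end.
    country = next((COUNTRY_VARIANTS[p] for p in reversed(parts)
                    if p in COUNTRY_VARIANTS), None)
    if country is None:
        return None
    # Step 2: match the city only against the gazetteer of that country.
    city = next((CITIES[country][p] for p in parts
                 if p in CITIES[country]), None)
    return (city, country) if city else None
```

Restricting the city lookup to the already matched country is what keeps homonymous cities in different countries apart in this sketch.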

The dataset also provides 4,710,548 records of citations between articles published 
in APS journals. To build citation networks at the city level, we merge the citation 
links from the same source node to the same target node, and assign the total number of 
citations to this link as its weight. For articles with multiple city names, the weight is 
equally distributed among the links of these nodes. In total there are 2,765,565 links in 
the city-to-city citation networks from 1960 to 2009. (For the full details of parsing 
country and city names, as well as building networks, see Supplementary Information 
(SI)). 
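
As an illustration of the aggregation step, the following sketch merges paper-to-paper citations into weighted city-to-city links. It assumes that each citation carries unit weight and that this weight is split evenly over all (citing city, cited city) pairs. This is one possible reading of the equal-distribution rule, and the function name and input layout are hypothetical.

```python
from collections import defaultdict


def city_network(paper_cities, citations):
    """Aggregate paper-to-paper citations into weighted city-to-city links.

    paper_cities: dict mapping paper id -> list of distinct city names
    citations:    iterable of (citing_paper, cited_paper) pairs
    Assumption: each citation has unit weight, split evenly over all
    (citing city, cited city) pairs.
    """
    weights = defaultdict(float)
    for citing, cited in citations:
        sources = paper_cities.get(citing, [])
        targets = paper_cities.get(cited, [])
        if not sources or not targets:
            continue  # skip papers that could not be geolocalized
        share = 1.0 / (len(sources) * len(targets))
        for s in sources:
            for t in targets:
                weights[(s, t)] += share  # merge links with the same endpoints
    return dict(weights)
```

With this convention the total weight of the city network equals the number of geolocalized citations. Citations within the same city appear as self-loops, which the sketch keeps.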

Knowledge diffusion proxy algorithm. This analysis tool is inspired by the dollar 
experiment, originally developed to characterize the flow of money in economic 
networks 46 . Formally, it is a biased random walk with sources and sinks where a 
citation diffuses in the network. The diffusion takes place on top of the network of net 
trade flows. Let us define $w_{ij}$ as the number of citations that node $i$ gives to $j$, and $w_{ji}$ as 
the opposite flow. We can define the antisymmetric matrix $T_{ij} = w_{ij} - w_{ji}$. The 
network of the net trade is defined by the matrix $F$ with $F_{ij} = |T_{ij}| = |T_{ji}|$ for all 
connected pairs with $T_{ij} < 0$ and $F_{ij} = 0$ for all connected pairs with $T_{ij} \geq 0$. 
There are two types of nodes. Producers are nodes with a positive trade unbalance 
$\Delta s_i = s_i^{\mathrm{in}} - s_i^{\mathrm{out}} = \sum_j F_{ji} - \sum_j F_{ij}$. Their strength-in is larger than their strength-out. 
On the other hand, consumers are nodes with a negative unbalance $\Delta s$. On top of this 
network a citation is injected in a producer city. The citation follows the outgoing 
edges with a probability proportional to their intensities, and the probability that the 
citation is absorbed in a consumer city $j$ is equal to $P_{\mathrm{abs}}(j) = \Delta s_j / s_j^{\mathrm{in}}$. By repeating 
this process many times from each starting point (producers) we can build a matrix 
with elements $e_{ij}$ that measure how many times a citation injected in the producer city 
$i$ is absorbed in a consumer city $j$. 
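
The random walk can be sketched in a few lines of Python. This is a minimal illustration and not the code used for the analysis. It assumes, so that the absorption probability stays between 0 and 1, that a walker can be absorbed only at nodes whose in-strength in $F$ exceeds their out-strength, with probability $(s^{\mathrm{in}} - s^{\mathrm{out}})/s^{\mathrm{in}}$. The function names, the `max_steps` guard and the fixed seed are likewise assumptions.

```python
import random
from collections import defaultdict


def net_flow(w):
    """Build F from citation weights w[(i, j)] = citations i gives to j."""
    F = defaultdict(dict)
    for (i, j) in list(w):
        t = w.get((i, j), 0.0) - w.get((j, i), 0.0)  # T_ij
        if i != j and t < 0:
            F[i][j] = -t  # F_ij = |T_ij| when T_ij < 0
        elif i != j and t > 0:
            F[j][i] = t   # same rule seen from j, since T_ji = -T_ij < 0
    return F


def strengths(F):
    """Return in- and out-strength of every node in F."""
    s_in, s_out = defaultdict(float), defaultdict(float)
    for i, nbrs in F.items():
        for j, f in nbrs.items():
            s_out[i] += f
            s_in[j] += f
    return s_in, s_out


def diffuse(F, source, n_walkers=10000, max_steps=1000, seed=0):
    """Count where walkers injected at `source` are absorbed."""
    rng = random.Random(seed)
    s_in, s_out = strengths(F)
    absorbed = defaultdict(int)
    for _ in range(n_walkers):
        node = source
        for _ in range(max_steps):
            nbrs = F.get(node, {})
            if not nbrs:
                absorbed[node] += 1  # no outgoing edge: the walk ends here
                break
            # Follow an outgoing edge with probability proportional to F.
            node = rng.choices(list(nbrs), weights=list(nbrs.values()))[0]
            gain = s_in[node] - s_out[node]
            if gain > 0 and rng.random() < gain / s_in[node]:
                absorbed[node] += 1
                break
    return dict(absorbed)
```

Calling `diffuse(F, i)` for every producer `i` gives one row of absorption counts per producer, which together play the role of the elements $e_{ij}$.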







Figure 7 | Correlation between scientific production ranking and 
ranking based on the number of publications in 2009. The x-axis 
represents rankings based on the number of papers each city published in 
2009, and the y-axis represents the scientific production ranking for each 
city in 2009. The solid line corresponds to the power-law fit of the data 
with slope -0.98, and separates the space into two regions. In the region 
below the line (coloured blue), cities gain better rankings from the scientific 
production ranking algorithm even with relatively fewer publications, such 
as Chicago and Piscataway. In the region above (coloured green), cities have 
lower rankings from the algorithm even though they have more papers published, 
such as Beijing, Berlin, Wako and Shanghai. 

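Scientific production ranking algorithm. The scientific production rank is defined 
for each node i according to this self-consistent equation: 

$$P_i = q z_i + (1-q) \sum_j \frac{w_{ji}}{s_j^{\mathrm{out}}} P_j + (1-q) z_i \sum_j P_j \, \delta\left(s_j^{\mathrm{out}}\right) \qquad (2)$$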



$P_i$ is the score of node $i$, $0 < q < 1$ is the damping factor (defining the probability of 
random jumps reaching any other node in the network), $w_{ji}$ is the weight of the 
directed connection from $j$ to $i$, $s_j^{\mathrm{out}}$ is the strength-out of node $j$, and finally $\delta(x)$ is 
the delta function, equal to 1 for $x = 0$ and 0 otherwise. Here we use the damping 
factor $q = 0.15$. The first term on the r.h.s. of Eq. (2) defines the redistribution of 
credits to all nodes in the network due to the random jumps in the diffusion. The 
second term defines the diffusion of credit through the network. Each node $i$ gets a 
fraction of credit from each citing node $j$ proportional to the ratio of the weight of link 
$j \to i$ to the strength-out of node $j$. Finally, the last term defines the redistribution of 
credits to all the nodes in the network due to the nodes with zero strength-out. In the 
original PageRank the vector $z$ has all components equal to $1/N$ (where $N$ is the 
total number of nodes). Each component has the same value because the jumps are 
homogeneous. Here, instead, the vector $z$ accounts for the normalized scientific 
credit given to node $i$ based on its productivity. Mathematically we have: 



$$z_i = \frac{\sum_p \delta_{p,i} / n_p}{\sum_j \sum_p \delta_{p,j} / n_p} \qquad (3)$$



where $p$ denotes the generic paper and $n_p$ the number of nodes that have written the 
paper. It is important to notice that $\delta_{p,i} = 1$ only if the $i$-th node wrote the paper $p$, 
otherwise it equals zero. 
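
For illustration, the self-consistent equation can be solved by simple fixed-point iteration, as in the sketch below. It is not the code used for the analysis. The function name, the input layout, the stopping tolerance and the choice of $z$ as the starting vector are assumptions, and all link weights are assumed to be strictly positive.

```python
def production_rank(w, papers, q=0.15, tol=1e-12, max_iter=1000):
    """Iterate Eq. (2) until the scores stop changing.

    w:      dict (j, i) -> weight of the directed link from j to i
    papers: list of papers, each given as the list of cities that wrote it
    """
    # Eq. (3): each paper gives credit 1/n_p to each of its n_p cities.
    credit = {}
    for cities in papers:
        for c in cities:
            credit[c] = credit.get(c, 0.0) + 1.0 / len(cities)
    nodes = sorted(set(credit) | {n for link in w for n in link})
    total = sum(credit.values())
    z = {n: credit.get(n, 0.0) / total for n in nodes}

    # Strength-out of each node in the citation network.
    s_out = {n: 0.0 for n in nodes}
    for (j, _), weight in w.items():
        s_out[j] += weight

    P = dict(z)  # assumed starting point; any normalized vector would do
    for _ in range(max_iter):
        # Score held by nodes with zero strength-out (last term of Eq. 2).
        dangling = sum(P[j] for j in nodes if s_out[j] == 0)
        new = {i: q * z[i] + (1 - q) * z[i] * dangling for i in nodes}
        # Diffusion of credit along links j -> i (second term of Eq. 2).
        for (j, i), weight in w.items():
            new[i] += (1 - q) * weight / s_out[j] * P[j]
        if sum(abs(new[n] - P[n]) for n in nodes) < tol:
            return new
        P = new
    return P
```

Because $z$ sums to one, each iteration preserves the total score, so the returned values can be read directly as shares of the overall credit.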